Overview

Dataset statistics

Number of variables: 13
Number of observations: 889
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 90.4 KiB
Average record size in memory: 104.1 B

Variable types

Numeric: 6
Categorical: 7

Warnings

Name has a high cardinality: 889 distinct values [High cardinality]
Ticket has a high cardinality: 680 distinct values [High cardinality]
df_index is highly correlated with PassengerId [High correlation]
PassengerId is highly correlated with df_index [High correlation]
df_index is uniformly distributed [Uniform]
PassengerId is uniformly distributed [Uniform]
Name is uniformly distributed [Uniform]
Ticket is uniformly distributed [Uniform]
df_index has unique values [Unique]
PassengerId has unique values [Unique]
Name has unique values [Unique]
SibSp has 606 (68.2%) zeros [Zeros]
Parch has 676 (76.0%) zeros [Zeros]
Fare has 15 (1.7%) zeros [Zeros]
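
The percentages in the Zeros warnings above are each zero count divided by the number of observations. A minimal sketch that reproduces them from the counts in this report; the names `N_OBSERVATIONS`, `ZERO_COUNTS` and `zero_share` are illustrative and not part of pandas-profiling:

```python
# Counts and the number of observations are copied from this report.
N_OBSERVATIONS = 889
ZERO_COUNTS = {"SibSp": 606, "Parch": 676, "Fare": 15}


def zero_share(count, total):
    """Return count / total formatted as a percentage with one decimal."""
    return f"{count / total:.1%}"


zero_shares = {
    name: zero_share(count, N_OBSERVATIONS)
    for name, count in ZERO_COUNTS.items()
}

for name, share in zero_shares.items():
    print(f"{name}: {share}")
```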

Reproduction

Analysis started: 2021-03-18 14:49:53.480986
Analysis finished: 2021-03-18 14:50:04.272909
Duration: 10.79 seconds
Software version: pandas-profiling v2.12.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 889
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 445
Minimum: 0
Maximum: 890
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 44.4
Q1: 223
Median: 445
Q3: 667
95th percentile: 845.6
Maximum: 890
Range: 890
Interquartile range (IQR): 444

Descriptive statistics

Standard deviation: 256.9981728
Coefficient of variation (CV): 0.5775239838
Kurtosis: -1.197156422
Mean: 445
Median Absolute Deviation (MAD): 222
Skewness: 0
Sum: 395605
Variance: 66048.06081
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
0 | 1 | 0.1%
598 | 1 | 0.1%
587 | 1 | 0.1%
588 | 1 | 0.1%
589 | 1 | 0.1%
590 | 1 | 0.1%
591 | 1 | 0.1%
592 | 1 | 0.1%
593 | 1 | 0.1%
594 | 1 | 0.1%
Other values (879) | 879 | 98.9%

Value | Count | Frequency (%)
0 | 1 | 0.1%
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%

Value | Count | Frequency (%)
890 | 1 | 0.1%
889 | 1 | 0.1%
888 | 1 | 0.1%
887 | 1 | 0.1%
886 | 1 | 0.1%

PassengerId
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 889
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 446
Minimum: 1
Maximum: 891
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 45.4
Q1: 224
Median: 446
Q3: 668
95th percentile: 846.6
Maximum: 891
Range: 890
Interquartile range (IQR): 444

Descriptive statistics

Standard deviation: 256.9981728
Coefficient of variation (CV): 0.5762290869
Kurtosis: -1.197156422
Mean: 446
Median Absolute Deviation (MAD): 222
Skewness: 0
Sum: 396494
Variance: 66048.06081
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
1 | 1 | 0.1%
599 | 1 | 0.1%
588 | 1 | 0.1%
589 | 1 | 0.1%
590 | 1 | 0.1%
591 | 1 | 0.1%
592 | 1 | 0.1%
593 | 1 | 0.1%
594 | 1 | 0.1%
595 | 1 | 0.1%
Other values (879) | 879 | 98.9%

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%

Value | Count | Frequency (%)
891 | 1 | 0.1%
890 | 1 | 0.1%
889 | 1 | 0.1%
888 | 1 | 0.1%
887 | 1 | 0.1%

Survived
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

0: 549
1: 340

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 889
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Value | Count | Frequency (%)
0 | 549 | 61.8%
1 | 340 | 38.2%

Histogram of lengths of the category

Value | Count | Frequency (%)
0 | 549 | 61.8%
1 | 340 | 38.2%

Most occurring characters

Value | Count | Frequency (%)
0 | 549 | 61.8%
1 | 340 | 38.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 889 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 549 | 61.8%
1 | 340 | 38.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 889 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 549 | 61.8%
1 | 340 | 38.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 889 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 549 | 61.8%
1 | 340 | 38.2%

Pclass
Categorical

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

3: 491
1: 214
2: 184

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 889
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 3
4th row: 1
5th row: 3

Value | Count | Frequency (%)
3 | 491 | 55.2%
1 | 214 | 24.1%
2 | 184 | 20.7%

Histogram of lengths of the category

Value | Count | Frequency (%)
3 | 491 | 55.2%
1 | 214 | 24.1%
2 | 184 | 20.7%

Most occurring characters

Value | Count | Frequency (%)
3 | 491 | 55.2%
1 | 214 | 24.1%
2 | 184 | 20.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 889 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
3 | 491 | 55.2%
1 | 214 | 24.1%
2 | 184 | 20.7%

Most occurring scripts

Value | Count | Frequency (%)
Common | 889 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
3 | 491 | 55.2%
1 | 214 | 24.1%
2 | 184 | 20.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 889 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
3 | 491 | 55.2%
1 | 214 | 24.1%
2 | 184 | 20.7%

Name
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 889
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Perreault, Miss. Anne: 1
Montvila, Rev. Juozas: 1
Moran, Mr. Daniel J: 1
Ryerson, Miss. Susan Parker "Suzette": 1
Harris, Mr. Walter: 1
Other values (884): 884

Length

Max length: 82
Median length: 25
Mean length: 26.9583802
Min length: 12

Characters and Unicode

Total characters: 23966
Distinct characters: 60
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 889
Unique (%): 100.0%

Sample

1st row: Braund, Mr. Owen Harris
2nd row: Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3rd row: Heikkinen, Miss. Laina
4th row: Futrelle, Mrs. Jacques Heath (Lily May Peel)
5th row: Allen, Mr. William Henry

Value | Count | Frequency (%)
Perreault, Miss. Anne | 1 | 0.1%
Montvila, Rev. Juozas | 1 | 0.1%
Moran, Mr. Daniel J | 1 | 0.1%
Ryerson, Miss. Susan Parker "Suzette" | 1 | 0.1%
Harris, Mr. Walter | 1 | 0.1%
Sutehall, Mr. Henry Jr | 1 | 0.1%
Smiljanic, Mr. Mile | 1 | 0.1%
Morrow, Mr. Thomas Rowan | 1 | 0.1%
Danoff, Mr. Yoto | 1 | 0.1%
Douglas, Mr. Walter Donald | 1 | 0.1%
Other values (879) | 879 | 98.9%

Histogram of lengths of the category

Value | Count | Frequency (%)
mr | 521 | 14.4%
miss | 181 | 5.0%
mrs | 128 | 3.5%
william | 64 | 1.8%
john | 44 | 1.2%
master | 40 | 1.1%
henry | 35 | 1.0%
james | 24 | 0.7%
george | 23 | 0.6%
charles | 23 | 0.6%
Other values (1512) | 2532 | 70.0%

Most occurring characters

Value | Count | Frequency (%)
(space) | 2728 | 11.4%
r | 1954 | 8.2%
e | 1696 | 7.1%
a | 1654 | 6.9%
i | 1323 | 5.5%
n | 1301 | 5.4%
s | 1293 | 5.4%
M | 1125 | 4.7%
l | 1064 | 4.4%
o | 1005 | 4.2%
Other values (50) | 8823 | 36.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 15408 | 64.3%
Uppercase Letter | 3636 | 15.2%
Space Separator | 2728 | 11.4%
Other Punctuation | 1895 | 7.9%
Open Punctuation | 143 | 0.6%
Close Punctuation | 143 | 0.6%
Dash Punctuation | 13 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 1954 | 12.7%
e | 1696 | 11.0%
a | 1654 | 10.7%
i | 1323 | 8.6%
n | 1301 | 8.4%
s | 1293 | 8.4%
l | 1064 | 6.9%
o | 1005 | 6.5%
t | 665 | 4.3%
h | 516 | 3.3%
Other values (16) | 2937 | 19.1%

Uppercase Letter
Value | Count | Frequency (%)
M | 1125 | 30.9%
A | 249 | 6.8%
J | 215 | 5.9%
H | 203 | 5.6%
S | 179 | 4.9%
C | 172 | 4.7%
E | 165 | 4.5%
W | 143 | 3.9%
B | 140 | 3.9%
L | 129 | 3.5%
Other values (15) | 916 | 25.2%

Other Punctuation
Value | Count | Frequency (%)
. | 890 | 47.0%
, | 889 | 46.9%
" | 106 | 5.6%
' | 9 | 0.5%
/ | 1 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 2728 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 143 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 143 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 13 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 19044 | 79.5%
Common | 4922 | 20.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
r | 1954 | 10.3%
e | 1696 | 8.9%
a | 1654 | 8.7%
i | 1323 | 6.9%
n | 1301 | 6.8%
s | 1293 | 6.8%
M | 1125 | 5.9%
l | 1064 | 5.6%
o | 1005 | 5.3%
t | 665 | 3.5%
Other values (41) | 5964 | 31.3%

Common
Value | Count | Frequency (%)
(space) | 2728 | 55.4%
. | 890 | 18.1%
, | 889 | 18.1%
( | 143 | 2.9%
) | 143 | 2.9%
" | 106 | 2.2%
- | 13 | 0.3%
' | 9 | 0.2%
/ | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 23966 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
(space) | 2728 | 11.4%
r | 1954 | 8.2%
e | 1696 | 7.1%
a | 1654 | 6.9%
i | 1323 | 5.5%
n | 1301 | 5.4%
s | 1293 | 5.4%
M | 1125 | 4.7%
l | 1064 | 4.4%
o | 1005 | 4.2%
Other values (50) | 8823 | 36.8%

Sex
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

male: 577
female: 312

Length

Max length: 6
Median length: 4
Mean length: 4.701912261
Min length: 4

Characters and Unicode

Total characters: 4180
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: male
2nd row: female
3rd row: female
4th row: female
5th row: male

Value | Count | Frequency (%)
male | 577 | 64.9%
female | 312 | 35.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
male | 577 | 64.9%
female | 312 | 35.1%

Most occurring characters

Value | Count | Frequency (%)
e | 1201 | 28.7%
m | 889 | 21.3%
a | 889 | 21.3%
l | 889 | 21.3%
f | 312 | 7.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 4180 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
e | 1201 | 28.7%
m | 889 | 21.3%
a | 889 | 21.3%
l | 889 | 21.3%
f | 312 | 7.5%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 4180 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
e | 1201 | 28.7%
m | 889 | 21.3%
a | 889 | 21.3%
l | 889 | 21.3%
f | 312 | 7.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4180 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
e | 1201 | 28.7%
m | 889 | 21.3%
a | 889 | 21.3%
l | 889 | 21.3%
f | 312 | 7.5%

Age
Real number (ℝ≥0)

Distinct: 88
Distinct (%): 9.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.31515186
Minimum: 0.42
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0.42
5th percentile: 6
Q1: 22
Median: 28
Q3: 35
95th percentile: 54
Maximum: 80
Range: 79.58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 12.98493229
Coefficient of variation (CV): 0.4429426925
Kurtosis: 1.007819813
Mean: 29.31515186
Median Absolute Deviation (MAD): 6
Skewness: 0.5080100783
Sum: 26061.17
Variance: 168.6084667
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
28 | 202 | 22.7%
24 | 30 | 3.4%
22 | 27 | 3.0%
18 | 26 | 2.9%
19 | 25 | 2.8%
30 | 25 | 2.8%
21 | 24 | 2.7%
25 | 23 | 2.6%
36 | 22 | 2.5%
29 | 20 | 2.2%
Other values (78) | 465 | 52.3%

Value | Count | Frequency (%)
0.42 | 1 | 0.1%
0.67 | 1 | 0.1%
0.75 | 2 | 0.2%
0.83 | 2 | 0.2%
0.92 | 1 | 0.1%

Value | Count | Frequency (%)
80 | 1 | 0.1%
74 | 1 | 0.1%
71 | 2 | 0.2%
70.5 | 1 | 0.1%
70 | 2 | 0.2%

SibSp
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5241844769
Minimum: 0
Maximum: 8
Zeros: 606
Zeros (%): 68.2%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.103704876
Coefficient of variation (CV): 2.105565739
Kurtosis: 17.83897238
Mean: 0.5241844769
Median Absolute Deviation (MAD): 0
Skewness: 3.691057631
Sum: 466
Variance: 1.218164452
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Value | Count | Frequency (%)
0 | 606 | 68.2%
1 | 209 | 23.5%
2 | 28 | 3.1%
4 | 18 | 2.0%
3 | 16 | 1.8%
8 | 7 | 0.8%
5 | 5 | 0.6%

Value | Count | Frequency (%)
0 | 606 | 68.2%
1 | 209 | 23.5%
2 | 28 | 3.1%
3 | 16 | 1.8%
4 | 18 | 2.0%

Value | Count | Frequency (%)
8 | 7 | 0.8%
5 | 5 | 0.6%
4 | 18 | 2.0%
3 | 16 | 1.8%
2 | 28 | 3.1%

Parch
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3824521935
Minimum: 0
Maximum: 6
Zeros: 676
Zeros (%): 76.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8067607445
Coefficient of variation (CV): 2.109442064
Kurtosis: 9.750591706
Mean: 0.3824521935
Median Absolute Deviation (MAD): 0
Skewness: 2.745160126
Sum: 340
Variance: 0.6508628989
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Value | Count | Frequency (%)
0 | 676 | 76.0%
1 | 118 | 13.3%
2 | 80 | 9.0%
3 | 5 | 0.6%
5 | 5 | 0.6%
4 | 4 | 0.4%
6 | 1 | 0.1%

Value | Count | Frequency (%)
0 | 676 | 76.0%
1 | 118 | 13.3%
2 | 80 | 9.0%
3 | 5 | 0.6%
4 | 4 | 0.4%

Value | Count | Frequency (%)
6 | 1 | 0.1%
5 | 5 | 0.6%
4 | 4 | 0.4%
3 | 5 | 0.6%
2 | 80 | 9.0%

Ticket
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 680
Distinct (%): 76.5%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

347082: 7
CA. 2343: 7
1601: 7
CA 2144: 6
3101295: 6
Other values (675): 856

Length

Max length: 18
Median length: 6
Mean length: 6.752530934
Min length: 3

Characters and Unicode

Total characters: 6003
Distinct characters: 35
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 547
Unique (%): 61.5%

Sample

1st row: A/5 21171
2nd row: PC 17599
3rd row: STON/O2. 3101282
4th row: 113803
5th row: 373450

Value | Count | Frequency (%)
347082 | 7 | 0.8%
CA. 2343 | 7 | 0.8%
1601 | 7 | 0.8%
CA 2144 | 6 | 0.7%
3101295 | 6 | 0.7%
347088 | 6 | 0.7%
382652 | 5 | 0.6%
S.O.C. 14879 | 5 | 0.6%
349909 | 4 | 0.4%
113760 | 4 | 0.4%
Other values (670) | 832 | 93.6%

Histogram of lengths of the category

Value | Count | Frequency (%)
pc | 60 | 5.3%
c.a | 27 | 2.4%
a/5 | 17 | 1.5%
ca | 14 | 1.2%
2 | 12 | 1.1%
ston/o | 12 | 1.1%
w./c | 9 | 0.8%
sc/paris | 9 | 0.8%
soton/o.q | 8 | 0.7%
347082 | 7 | 0.6%
Other values (708) | 953 | 84.5%

Most occurring characters

Value | Count | Frequency (%)
3 | 744 | 12.4%
1 | 685 | 11.4%
2 | 592 | 9.9%
7 | 488 | 8.1%
4 | 464 | 7.7%
6 | 422 | 7.0%
0 | 406 | 6.8%
5 | 385 | 6.4%
9 | 328 | 5.5%
8 | 282 | 4.7%
Other values (25) | 1207 | 20.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4796 | 79.9%
Uppercase Letter | 652 | 10.9%
Other Punctuation | 295 | 4.9%
Space Separator | 239 | 4.0%
Lowercase Letter | 21 | 0.3%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
C | 151 | 23.2%
O | 100 | 15.3%
P | 98 | 15.0%
A | 82 | 12.6%
S | 74 | 11.3%
N | 40 | 6.1%
T | 36 | 5.5%
W | 16 | 2.5%
Q | 15 | 2.3%
I | 11 | 1.7%
Other values (6) | 29 | 4.4%

Decimal Number
Value | Count | Frequency (%)
3 | 744 | 15.5%
1 | 685 | 14.3%
2 | 592 | 12.3%
7 | 488 | 10.2%
4 | 464 | 9.7%
6 | 422 | 8.8%
0 | 406 | 8.5%
5 | 385 | 8.0%
9 | 328 | 6.8%
8 | 282 | 5.9%

Lowercase Letter
Value | Count | Frequency (%)
a | 6 | 28.6%
s | 5 | 23.8%
r | 4 | 19.0%
i | 4 | 19.0%
l | 1 | 4.8%
e | 1 | 4.8%

Other Punctuation
Value | Count | Frequency (%)
. | 197 | 66.8%
/ | 98 | 33.2%

Space Separator
Value | Count | Frequency (%)
(space) | 239 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 5330 | 88.8%
Latin | 673 | 11.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
C | 151 | 22.4%
O | 100 | 14.9%
P | 98 | 14.6%
A | 82 | 12.2%
S | 74 | 11.0%
N | 40 | 5.9%
T | 36 | 5.3%
W | 16 | 2.4%
Q | 15 | 2.2%
I | 11 | 1.6%
Other values (12) | 50 | 7.4%

Common
Value | Count | Frequency (%)
3 | 744 | 14.0%
1 | 685 | 12.9%
2 | 592 | 11.1%
7 | 488 | 9.2%
4 | 464 | 8.7%
6 | 422 | 7.9%
0 | 406 | 7.6%
5 | 385 | 7.2%
9 | 328 | 6.2%
8 | 282 | 5.3%
Other values (3) | 534 | 10.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6003 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
3 | 744 | 12.4%
1 | 685 | 11.4%
2 | 592 | 9.9%
7 | 488 | 8.1%
4 | 464 | 7.7%
6 | 422 | 7.0%
0 | 406 | 6.8%
5 | 385 | 6.4%
9 | 328 | 5.5%
8 | 282 | 4.7%
Other values (25) | 1207 | 20.1%

Fare
Real number (ℝ≥0)

ZEROS

Distinct: 247
Distinct (%): 27.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.09668088
Minimum: 0
Maximum: 512.3292
Zeros: 15
Zeros (%): 1.7%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 7.225
Q1: 7.8958
Median: 14.4542
Q3: 31
95th percentile: 112.31832
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 23.1042

Descriptive statistics

Standard deviation: 49.69750432
Coefficient of variation (CV): 1.548368958
Kurtosis: 33.50847727
Mean: 32.09668088
Median Absolute Deviation (MAD): 6.9042
Skewness: 4.801440211
Sum: 28533.9493
Variance: 2469.841935
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
8.05 | 43 | 4.8%
13 | 42 | 4.7%
7.8958 | 38 | 4.3%
7.75 | 34 | 3.8%
26 | 31 | 3.5%
10.5 | 24 | 2.7%
7.925 | 18 | 2.0%
7.775 | 16 | 1.8%
7.2292 | 15 | 1.7%
26.55 | 15 | 1.7%
Other values (237) | 613 | 69.0%

Value | Count | Frequency (%)
0 | 15 | 1.7%
4.0125 | 1 | 0.1%
5 | 1 | 0.1%
6.2375 | 1 | 0.1%
6.4375 | 1 | 0.1%

Value | Count | Frequency (%)
512.3292 | 3 | 0.3%
263 | 4 | 0.4%
262.375 | 2 | 0.2%
247.5208 | 2 | 0.2%
227.525 | 4 | 0.4%

Embarked
Categorical

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

S: 644
C: 168
Q: 77

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 889
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S
2nd row: C
3rd row: S
4th row: S
5th row: S

Value | Count | Frequency (%)
S | 644 | 72.4%
C | 168 | 18.9%
Q | 77 | 8.7%

Histogram of lengths of the category

Value | Count | Frequency (%)
s | 644 | 72.4%
c | 168 | 18.9%
q | 77 | 8.7%

Most occurring characters

Value | Count | Frequency (%)
S | 644 | 72.4%
C | 168 | 18.9%
Q | 77 | 8.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 889 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
S | 644 | 72.4%
C | 168 | 18.9%
Q | 77 | 8.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 889 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
S | 644 | 72.4%
C | 168 | 18.9%
Q | 77 | 8.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 889 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
S | 644 | 72.4%
C | 168 | 18.9%
Q | 77 | 8.7%

Age Group
Categorical

Distinct: 5
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Young: 428
Adult: 323
Kid: 69
Teenager: 44
Old: 25

Length

Max length: 9
Median length: 5
Mean length: 4.986501687
Min length: 3

Characters and Unicode

Total characters: 4433
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Young
2nd row: Adult
3rd row: Young
4th row: Adult
5th row: Adult

Value | Count | Frequency (%)
Young | 428 | 48.1%
Adult | 323 | 36.3%
Kid | 69 | 7.8%
Teenager | 44 | 4.9%
Old | 25 | 2.8%

Histogram of lengths of the category

Value | Count | Frequency (%)
young | 428 | 48.1%
adult | 323 | 36.3%
kid | 69 | 7.8%
teenager | 44 | 4.9%
old | 25 | 2.8%

Most occurring characters

Value | Count | Frequency (%)
u | 751 | 16.9%
n | 472 | 10.6%
g | 472 | 10.6%
Y | 428 | 9.7%
o | 428 | 9.7%
d | 417 | 9.4%
l | 348 | 7.9%
A | 323 | 7.3%
t | 323 | 7.3%
e | 132 | 3.0%
Other values (7) | 339 | 7.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3500 | 79.0%
Uppercase Letter | 889 | 20.1%
Space Separator | 44 | 1.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
u | 751 | 21.5%
n | 472 | 13.5%
g | 472 | 13.5%
o | 428 | 12.2%
d | 417 | 11.9%
l | 348 | 9.9%
t | 323 | 9.2%
e | 132 | 3.8%
i | 69 | 2.0%
a | 44 | 1.3%

Uppercase Letter
Value | Count | Frequency (%)
Y | 428 | 48.1%
A | 323 | 36.3%
K | 69 | 7.8%
T | 44 | 4.9%
O | 25 | 2.8%

Space Separator
Value | Count | Frequency (%)
(space) | 44 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 4389 | 99.0%
Common | 44 | 1.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
u | 751 | 17.1%
n | 472 | 10.8%
g | 472 | 10.8%
Y | 428 | 9.8%
o | 428 | 9.8%
d | 417 | 9.5%
l | 348 | 7.9%
A | 323 | 7.4%
t | 323 | 7.4%
e | 132 | 3.0%
Other values (6) | 295 | 6.7%

Common
Value | Count | Frequency (%)
(space) | 44 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4433 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
u | 751 | 16.9%
n | 472 | 10.6%
g | 472 | 10.6%
Y | 428 | 9.7%
o | 428 | 9.7%
d | 417 | 9.4%
l | 348 | 7.9%
A | 323 | 7.3%
t | 323 | 7.3%
e | 132 | 3.0%
Other values (7) | 339 | 7.6%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only its direction does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
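
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code pandas-profiling runs; the function name `pearson_r` and the toy inputs are assumptions made for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (n - 1 in the denominator).
    # The (n - 1) factors cancel in the ratio, so population formulas give the same r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Note: a constant input has zero standard deviation and r is undefined.
    return cov / (std_x * std_y)


# Toy input: the first five df_index and PassengerId values shown in this report.
# They differ by a constant offset, so r is 1 up to floating-point rounding.
df_index = [0, 1, 2, 3, 4]
passenger_id = [1, 2, 3, 4, 5]
r_example = pearson_r(df_index, passenger_id)

# Invariance under location and scale: rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(df_index, [10 * v + 7 for v in passenger_id])
```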

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
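
A minimal sketch of that rank-based calculation, assuming tied values receive the average of their positions (a common convention, not confirmed by this report). The helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Toy input: a monotonic but nonlinear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)   # ranks agree exactly, so rho is 1 up to rounding
r_linear = pearson_r(x, y) # below 1 because the relationship is not linear
```

On this toy input the ranks of x and y coincide, which is why ρ reaches 1 while r stays below it.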

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
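
The pair counting can be sketched as follows. This implements the formula exactly as stated here (often called tau-a), counting tied pairs as neither concordant nor discordant; libraries often use a tie-adjusted variant, so results may differ from the report when ties are present. The function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    concordant = 0
    discordant = 0
    total_pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        total_pairs += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order the pair the same way
        elif direction < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # direction == 0 means a tie in x or y; it counts toward neither total
    return (concordant - discordant) / total_pairs


# Toy input: 6 pairs in total, 5 concordant and 1 discordant, so tau = 4 / 6.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```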

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available (linked from the original report).

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013 (linked from the original report).
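
A minimal sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013), assuming no continuity correction on the chi-squared statistic; it need not match this report's implementation digit for digit. The function name `cramers_v_corrected` and the toy inputs are illustrative.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a variable with a single level carries no association
    return sqrt(phi2_corr / denom)


# Toy input: two identical labelings (perfect association) and two independent ones.
labels = ["a"] * 50 + ["b"] * 50
v_perfect = cramers_v_corrected(labels, labels)
v_independent = cramers_v_corrected(["a", "a", "b", "b"], ["u", "v", "u", "v"])
```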

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

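(row) | df_index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Embarked | Age Group
0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | S | Young
1 | 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C | Adult
2 | 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | S | Young
3 | 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | S | Adult
4 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | S | Adult
5 | 5 | 6 | 0 | 3 | Moran, Mr. James | male | 28.0 | 0 | 0 | 330877 | 8.4583 | Q | Young
6 | 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | S | Adult
7 | 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | S | Kid
8 | 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | S | Young
9 | 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | C | Teenager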

Last rows

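(row) | df_index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Embarked | Age Group
879 | 881 | 882 | 0 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | S | Adult
880 | 882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | S | Young
881 | 883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | S | Young
882 | 884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | S | Young
883 | 885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | Q | Adult
884 | 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | S | Young
885 | 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | S | Young
886 | 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | 28.0 | 1 | 2 | W./C. 6607 | 23.4500 | S | Young
887 | 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C | Young
888 | 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | Q | Adult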